Overview

Dataset Statistics

Number of Variables 19
Number of Rows 61345
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 11.1 MB
Average Row Size in Memory 188.9 B
Variable Types
  • Numerical: 11
  • Categorical: 8

Dataset Insights

  • Soil_pH is normally distributed (Normal)
  • Water_Quality has constant length 1 (Constant Length)
  • Ecological_Health_Label_Ecologically Critical has constant length 1 (Constant Length)
  • Ecological_Health_Label_Ecologically Degraded has constant length 1 (Constant Length)
  • Ecological_Health_Label_Ecologically Healthy has constant length 1 (Constant Length)
  • Ecological_Health_Label_Ecologically Stable has constant length 1 (Constant Length)
  • Ecological_Health_Label_encoded has constant length 1 (Constant Length)

Variables


PM2.5

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 100.1129
Minimum 0.2762
Maximum 755.2852
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PM2.5 is skewed right (γ1 = 1.4371)

Quantile Statistics

Minimum 0.2762
5-th Percentile 18.0193
Q1 47.9836
Median 83.9433
Q3 134.9439
95-th Percentile 237.6456
Maximum 755.2852
Range 755.009
IQR 86.9603

Descriptive Statistics

Mean 100.1129
Standard Deviation 70.9758
Variance 5037.5591
Sum 6.1414e+06
Skewness 1.4371
Kurtosis 3.1825
Coefficient of Variation 0.709
  • PM2.5 is not normally distributed (p-value 3.900228030760396e-05)
  • PM2.5 has 1876 outliers
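
The derived PM2.5 figures above can be cross-checked from the reported values alone. The sketch below is a consistency check, not the profiler's own code. It assumes the standard definitions (Range = Maximum - Minimum, IQR = Q3 - Q1, Variance = Standard Deviation squared, Coefficient of Variation = Standard Deviation / Mean), and the variable names are illustrative.

```python
# Values copied from the PM2.5 section of this report.
minimum, maximum = 0.2762, 755.2852
q1, q3 = 47.9836, 134.9439
mean, std = 100.1129, 70.9758

# Standard definitions (an assumption about how the report derives them).
value_range = maximum - minimum   # reported Range: 755.009
iqr = q3 - q1                     # reported IQR: 86.9603
variance = std ** 2               # reported Variance: 5037.5591
coeff_var = std / mean            # reported Coefficient of Variation: 0.709

print(round(value_range, 4), round(iqr, 4), round(variance, 2), round(coeff_var, 3))
```

Small differences in the last digits (for example in the variance) are expected, because the inputs are already rounded to four decimals.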

Temperature

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 20.0325
Minimum -0.4042
Maximum 41.4914
Zeros 0
Zeros (%) 0.0%
Negatives 1
Negatives (%) 0.0%
  • Temperature is skewed right (γ1 = 0.0164)

Quantile Statistics

Minimum -0.4042
5-th Percentile 11.8747
Q1 16.6754
Median 20.0156
Q3 23.3667
95-th Percentile 28.2728
Maximum 41.4914
Range 41.8956
IQR 6.6914

Descriptive Statistics

Mean 20.0325
Standard Deviation 4.9915
Variance 24.9154
Sum 1.2289e+06
Skewness 0.0164
Kurtosis 0.02829
Coefficient of Variation 0.2492
  • Temperature has 477 outliers
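
The report does not say how outliers are counted. A common convention, assumed here and not confirmed by the report, is Tukey's rule: a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3. The helper `tukey_fences` is a hypothetical name used only for this sketch.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 and Q3 copied from the Temperature section of this report.
lower, upper = tukey_fences(16.6754, 23.3667)
print(f"Temperature fences: [{lower:.4f}, {upper:.4f}]")
```

Under this assumption, both the reported Minimum (-0.4042) and Maximum (41.4914) for Temperature fall outside the fences.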

Humidity

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 65.1285
Minimum 30.0002
Maximum 99.9951
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Humidity is skewed left (γ1 = -0.0049)

Quantile Statistics

Minimum 30.0002
5-th Percentile 33.5963
Q1 47.6996
Median 65.1278
Q3 82.5951
95-th Percentile 96.6091
Maximum 99.9951
Range 69.9949
IQR 34.8956

Descriptive Statistics

Mean 65.1285
Standard Deviation 20.2006
Variance 408.0649
Sum 3.9953e+06
Skewness -0.004948
Kurtosis -1.1938
Coefficient of Variation 0.3102

Soil_Moisture

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 28.6658
Minimum 0.09745
Maximum 93.435
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Soil_Moisture is skewed right (γ1 = 0.5955)

Quantile Statistics

Minimum 0.09745
5-th Percentile 6.3401
Q1 16.1679
Median 26.6268
Q3 39.0565
95-th Percentile 58.3574
Maximum 93.435
Range 93.3376
IQR 22.8886

Descriptive Statistics

Mean 28.6658
Standard Deviation 16.0103
Variance 256.331
Sum 1.7585e+06
Skewness 0.5955
Kurtosis -0.1117
Coefficient of Variation 0.5585
  • Soil_Moisture is not normally distributed (p-value 5.19883252492024e-10)
  • Soil_Moisture has 398 outliers

Biodiversity_Index

numerical

Approximate Distinct Count 27
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 9.9999
Minimum 0
Maximum 26
Zeros 4
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Biodiversity_Index is skewed right (γ1 = 0.3378)

Quantile Statistics

Minimum 0
5-th Percentile 5
Q1 8
Median 10
Q3 12
95-th Percentile 15
Maximum 26
Range 26
IQR 4

Descriptive Statistics

Mean 9.9999
Standard Deviation 3.157
Variance 9.9669
Sum 613445
Skewness 0.3378
Kurtosis 0.1416
Coefficient of Variation 0.3157
  • Biodiversity_Index is not normally distributed (p-value 5.70007932871499e-06)
  • Biodiversity_Index has 502 outliers

Nutrient_Level

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 4091634
  • The most frequent value (0) occurs over 1.68 times as often as the second most frequent value (50)

Length

Mean 1.6987
Standard Deviation 0.7819
Median 1
Minimum 1
Maximum 3

Sample

1st row 50
2nd row 0
3rd row 50
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 104209
  • The top 2 categories (0, 50) account for over 50.0% of values

Water_Quality

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 4048770
  • The most frequent value (0) occurs over 2.38 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 61345
  • The top 2 categories (0, 1) account for over 50.0% of values
  • Water_Quality has words of constant length

Air_Quality_Index

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 100.1142
Minimum 13.6107
Maximum 185.7979
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Air_Quality_Index is skewed left (γ1 = -0.0004)

Quantile Statistics

Minimum 13.6107
5-th Percentile 67.3157
Q1 86.6758
Median 100.1732
Q3 113.5737
95-th Percentile 132.8087
Maximum 185.7979
Range 172.1872
IQR 26.8979

Descriptive Statistics

Mean 100.1142
Standard Deviation 19.9284
Variance 397.141
Sum 6.1415e+06
Skewness -0.00035961
Kurtosis 0.009365
Coefficient of Variation 0.1991
  • Air_Quality_Index has 428 outliers

Soil_pH

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 6.749
Minimum 5
Maximum 8.5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Soil_pH is normally distributed
  • Soil_pH is skewed right (γ1 = 0.0053)

Quantile Statistics

Minimum 5
5-th Percentile 5.1735
Q1 5.8701
Median 6.7475
Q3 7.6311
95-th Percentile 8.3336
Maximum 8.5
Range 3.4999
IQR 1.7611

Descriptive Statistics

Mean 6.749
Standard Deviation 1.0137
Variance 1.0276
Sum 414015.1655
Skewness 0.005336
Kurtosis -1.2047
Coefficient of Variation 0.1502

Dissolved_Oxygen

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 6.994
Minimum -0.8577
Maximum 14.3564
Zeros 0
Zeros (%) 0.0%
Negatives 11
Negatives (%) 0.0%
  • Dissolved_Oxygen is skewed left (γ1 = -0.016)

Quantile Statistics

Minimum -0.8577
5-th Percentile 3.6972
Q1 5.6388
Median 6.9937
Q3 8.3533
95-th Percentile 10.2896
Maximum 14.3564
Range 15.214
IQR 2.7145

Descriptive Statistics

Mean 6.994
Standard Deviation 2.0019
Variance 4.0075
Sum 429044.0258
Skewness -0.01595
Kurtosis -0.04003
Coefficient of Variation 0.2862
  • Dissolved_Oxygen is not normally distributed (p-value 0.004154162679227936)
  • Dissolved_Oxygen has 399 outliers

Chemical_Oxygen_Demand

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 149.8976
Minimum 0.005917
Maximum 299.9976
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Chemical_Oxygen_Demand is skewed right (γ1 = 0.0063)

Quantile Statistics

Minimum 0.005917
5-th Percentile 15.5036
Q1 74.7472
Median 149.7599
Q3 224.8194
95-th Percentile 285.0386
Maximum 299.9976
Range 299.9917
IQR 150.0722

Descriptive Statistics

Mean 149.8976
Standard Deviation 86.458
Variance 7474.988
Sum 9.1955e+06
Skewness 0.006279
Kurtosis -1.1973
Coefficient of Variation 0.5768

Biochemical_Oxygen_Demand

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 99.7863
Minimum 0.002086
Maximum 199.9973
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Biochemical_Oxygen_Demand is skewed right (γ1 = 0.0077)

Quantile Statistics

Minimum 0.002086
5-th Percentile 10.1095
Q1 49.6996
Median 99.3308
Q3 149.829
95-th Percentile 190.1454
Maximum 199.9973
Range 199.9952
IQR 100.1294

Descriptive Statistics

Mean 99.7863
Standard Deviation 57.7947
Variance 3340.2231
Sum 6.1214e+06
Skewness 0.007704
Kurtosis -1.2014
Coefficient of Variation 0.5792

Total_Dissolved_Solids

numerical

Approximate Distinct Count 61345
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 981520
Mean 249.6205
Minimum 0.002559
Maximum 499.9846
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Total_Dissolved_Solids is skewed right (γ1 = 0.0023)

Quantile Statistics

Minimum 0.002559
5-th Percentile 24.5724
Q1 124.6393
Median 250.0119
Q3 374.1801
95-th Percentile 474.7917
Maximum 499.9846
Range 499.9821
IQR 249.5409

Descriptive Statistics

Mean 249.6205
Standard Deviation 144.3334
Variance 20832.1392
Sum 1.5313e+07
Skewness 0.002287
Kurtosis -1.1998
Coefficient of Variation 0.5782

Ecological_Health_Label

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5208193
  • The most frequent value (Ecologically Healthy) occurs over 1.67 times as often as the second most frequent value (Ecologically Stable)

Length

Mean 19.9
Standard Deviation 0.6993
Median 20
Minimum 19
Maximum 21

Sample

1st row Ecologically Healt...
2nd row Ecologically Stabl...
3rd row Ecologically Healt...
4th row Ecologically Healt...
5th row Ecologically Criti...

Letter

Count 1159423
Lowercase Letter 1036733
Space Separator 61345
Uppercase Letter 122690
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Ecologically Healthy, Ecologically Stable) account for over 50.0% of values
  • The most frequent word (ecologically) occurs over 2.0 times as often as the second most frequent word (healthy)

Ecological_Health_Label_Ecologically Critical

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 4048770
  • The most frequent value (0) occurs over 19.53 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 61345
  • The top 2 categories (0, 1) account for over 50.0% of values
  • Ecological_Health_Label_Ecologically Critical has words of constant length

Ecological_Health_Label_Ecologically Degraded

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 4048770
  • The most frequent value (0) occurs over 5.63 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 61345
  • The top 2 categories (0, 1) account for over 50.0% of values
  • Ecological_Health_Label_Ecologically Degraded has words of constant length

Ecological_Health_Label_Ecologically Healthy

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 4048770

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 1
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 61345
  • The top 2 categories (1, 0) account for over 50.0% of values
  • Ecological_Health_Label_Ecologically Healthy has words of constant length

Ecological_Health_Label_Ecologically Stable

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 4048770
  • The most frequent value (0) occurs over 2.34 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 1
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 61345
  • The top 2 categories (0, 1) account for over 50.0% of values
  • Ecological_Health_Label_Ecologically Stable has words of constant length
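
The 0-to-1 ratios reported for the one-hot columns imply approximate class shares. The sketch below is a back-of-the-envelope estimate, not output from the profiler: the report states each ratio only as "over" a rounded figure, and the code assumes that the four labels are mutually exclusive and cover every row.

```python
# Ratios of 0s to 1s copied from the one-hot column insights in this report.
zero_to_one_ratio = {
    "Ecologically Critical": 19.53,
    "Ecologically Degraded": 5.63,
    "Ecologically Stable": 2.34,
}

# If zeros / ones = r, then ones / total = 1 / (1 + r).
shares = {label: 1 / (1 + r) for label, r in zero_to_one_ratio.items()}

# No ratio is reported for "Ecologically Healthy"; under the assumption that
# the four labels are exclusive and exhaustive, it takes the remainder.
shares["Ecologically Healthy"] = 1 - sum(shares.values())

for label, share in sorted(shares.items()):
    print(f"{label}: {share:.1%}")
```

The estimated Healthy-to-Stable ratio from these shares is close to the 1.67 figure reported for Ecological_Health_Label, which suggests the one-hot columns and the label column agree.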

Ecological_Health_Label_encoded

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 4048770
  • The most frequent value (2) occurs over 1.67 times as often as the second most frequent value (3)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 2
2nd row 3
3rd row 2
4th row 2
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 61345
  • The top 2 categories (2, 3) account for over 50.0% of values
  • Ecological_Health_Label_encoded has words of constant length
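
The sample rows suggest how the derived columns relate to Ecological_Health_Label. The sketch below is illustrative and makes two assumptions: the integer codes follow the alphabetical order of the labels, and the truncated sample labels are completed from the matching one-hot samples. The names `code_of` and `one_hot` are hypothetical.

```python
# Full label names are taken from the one-hot column names in this report.
labels = [
    "Ecologically Critical",
    "Ecologically Degraded",
    "Ecologically Healthy",
    "Ecologically Stable",
]

# Assumption: integer codes follow the alphabetical order of the labels.
code_of = {label: code for code, label in enumerate(sorted(labels))}

def one_hot(label: str) -> dict[str, int]:
    """Return the four indicator columns for a single label."""
    return {f"Ecological_Health_Label_{name}": int(name == label) for name in labels}

# The five sample rows of Ecological_Health_Label (truncated in the report;
# completed here from the matching one-hot sample rows).
sample = [
    "Ecologically Healthy",
    "Ecologically Stable",
    "Ecologically Healthy",
    "Ecologically Healthy",
    "Ecologically Critical",
]

encoded = [code_of[label] for label in sample]
print(encoded)  # [2, 3, 2, 2, 0], matching the Ecological_Health_Label_encoded sample rows
```

Under these assumptions the one-hot columns and the encoded column carry the same information as the label itself, so a model would normally use only one of the three representations.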

Interactions

Correlations

Missing Values